Generation of Synthetic Sonic Log Data Using Random Forest Algorithm at the Lagoa Parda Field in Espirito Santo, Brazil

1.1. Background

Well logs are interpreted and processed to estimate in-situ petrophysical and geomechanical properties, which are essential for subsurface characterization. Various types of logs exist, and each provides distinct information about subsurface properties. Certain well logs, such as gamma ray (GR), resistivity, density, and neutron logs, are considered “easy-to-acquire” conventional logs and are run in most wells. Other well logs, such as nuclear magnetic resonance, dielectric dispersion, elemental spectroscopy, and sometimes sonic logs, are run in only a limited number of wells.

Sonic travel-time logs contain critical geomechanical information for subsurface characterization around the wellbore. Sonic logs are often required to complete the well-seismic tie workflow or to predict geomechanical properties. When sonic logs are absent in a well or an interval, a common practice is to synthesize them from neighboring wells that do have sonic logs. This is referred to as sonic log synthesis or pseudo sonic log generation.

1.2. Problem Statement

Compressional travel-time (DT) logs are not acquired in all the wells drilled in a field because of financial or operational constraints. Under such circumstances, machine learning techniques can be used to predict DT logs and improve subsurface characterization. The goal of this study is to develop data-driven models from the “easy-to-acquire” conventional logs of a set of wells, and to use those models to generate synthetic compressional (DT) logs in the remaining wells. A robust data-driven model for sonic-log synthesis yields low prediction errors, which can be quantified as the Root Mean Squared Error (RMSE) between the synthesized and the original DT logs.

Our goal is to build generalizable data-driven models. The program then deploys the newly developed models on the test dataset to predict DT logs. The models should use feature sets derived from the following six logs: NPHI, GR, CALI, DEPT, RHOB, and ILD. The models should synthesize the target log: DT.

1.3. Data Description

1.4. Evaluation Metric

The model is evaluated with two metrics: Root Mean Squared Error (RMSE) and R².

The RMSE is calculated as:

$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}{\Big(\frac{d_i - f_i}{\sigma_i}\Big)^2}}$

Where:

R² (Variance Explained) is calculated as:

$R^2 = 1 - \frac{{SS}_{residual}}{{SS}_{total}} = 1 - \frac{\sum_{i}({y}_{i} - \hat{y}_{i})^2}{\sum_{i}({y}_{i} - \bar{y})^2}$

Where:

All DT samples carry the same weight in the evaluation.

Understanding these evaluation metrics, and optimizing the predictions for them, is paramount for this inference.
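As a minimal sketch of the two metrics, the functions below compute RMSE and R² with the standard library only. The function names, the optional `sigma` argument (defaulting to 1 for every sample, which is how equal weighting is interpreted here), and the DT values are illustrative assumptions, not the notebook's own code.

```python
import math

def rmse(observed, predicted, sigma=None):
    """Root mean squared error of (d_i - f_i) / sigma_i.

    sigma=None uses sigma_i = 1 for every sample (equal weights, an assumption).
    """
    if sigma is None:
        sigma = [1.0] * len(observed)
    squared = [((d - f) / s) ** 2 for d, f, s in zip(observed, predicted, sigma)]
    return math.sqrt(sum(squared) / len(squared))

def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# Made-up DT values in us/ft, for illustration only.
dt_true = [100.0, 110.0, 120.0, 130.0]
dt_pred = [102.0, 108.0, 121.0, 129.0]

print("RMSE:", rmse(dt_true, dt_pred))
print("R2:", r2(dt_true, dt_pred))
```

A lower RMSE and an R² closer to 1 both indicate a closer match between the synthesized and the recorded DT.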

1.5. Baseline

Faust's Equation = $\frac{1000}{({2*DEPT*ILD})^{\frac{1}{6}}}$
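The baseline expression can be sketched as a small function. This follows the formula exactly as written above; treating its result as a DT estimate, the function name `faust_dt`, and the example inputs are assumptions, and the source does not state the units expected for DEPT and ILD.

```python
def faust_dt(dept, ild):
    """Evaluate 1000 / (2 * DEPT * ILD) ** (1/6), as the baseline formula is written.

    dept: depth (units not stated in the source)
    ild:  deep induction resistivity (units not stated in the source)
    """
    if dept <= 0 or ild <= 0:
        raise ValueError("DEPT and ILD must be positive for the sixth root")
    return 1000.0 / (2.0 * dept * ild) ** (1.0 / 6.0)

# Illustrative inputs only: the estimate decreases as depth or resistivity grows.
shallow = faust_dt(500.0, 2.0)
deep = faust_dt(3000.0, 2.0)
print(shallow, deep)
```

Such a baseline gives a reference error level against which the data-driven model can be compared using the same evaluation metrics.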

2. Imports

3. Methods

4. Read Database

Inference is performed in wells where DT was not recorded. To synthesize the sonic curve, we therefore split the dataset between the wells that have a DT log and those that do not. We do this before any substantial visualization, so that we avoid the biases inherent to the visualization process.
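The split described here could look like the following pandas sketch. The miniature table, the `WELL` column name, and the rule that a well counts as "with DT" when at least one of its samples has a DT reading are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical miniature dataset; "WELL" is an assumed column name.
logs = pd.DataFrame({
    "WELL": ["A", "A", "B", "B", "C", "C"],
    "DEPT": [1000.0, 1000.5, 1000.0, 1000.5, 1000.0, 1000.5],
    "GR":   [55.0, 60.0, 70.0, 72.0, 65.0, 63.0],
    "DT":   [95.0, 97.0, None, None, 102.0, None],
})

# Wells with at least one recorded DT sample (assumed criterion).
wells_with_dt = logs.loc[logs["DT"].notna(), "WELL"].unique()
has_dt = logs["WELL"].isin(wells_with_dt)

train_wells = logs[has_dt]        # used to fit and validate the model
inference_wells = logs[~has_dt]   # wells where DT must be synthesized

print(sorted(train_wells["WELL"].unique()))
print(sorted(inference_wells["WELL"].unique()))
```

Well C keeps one sample without DT in the training portion; under these assumptions that sample would be dropped later, when samples with missing values are excluded.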

5. Data Cleaning

To apply our model, the submitted database must contain all of the selected mnemonics. Events with missing data are excluded from this analysis.

To clean the data, we discard caliper (CALI) outliers, ILD outliers, and events where DT exceeds 175 us/ft, which indicate poor borehole conditions such as mud cake or washout.
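A hedged sketch of these cleaning steps follows. The 175 us/ft cutoff comes from the text; the outlier rule for CALI and ILD is not specified in the source, so a 1.5 × IQR fence is assumed here purely for illustration, and the function names and sample values are hypothetical.

```python
import pandas as pd

MNEMONICS = ["NPHI", "GR", "CALI", "DEPT", "RHOB", "ILD", "DT"]

def iqr_mask(series, k=1.5):
    """True where a value lies within k * IQR of the quartiles (assumed outlier rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    spread = k * (q3 - q1)
    return series.between(q1 - spread, q3 + spread)

def clean_logs(df, dt_max=175.0):
    """Drop incomplete events, CALI/ILD outliers, and events with DT above dt_max."""
    df = df.dropna(subset=MNEMONICS)
    keep = iqr_mask(df["CALI"]) & iqr_mask(df["ILD"]) & (df["DT"] <= dt_max)
    return df[keep].reset_index(drop=True)

# Made-up events: one washed-out caliper reading, one DT spike, one missing DT.
raw = pd.DataFrame({
    "NPHI": [0.20, 0.22, 0.21, 0.25, 0.23, 0.22, 0.30, 0.24],
    "GR":   [60.0, 62.0, 58.0, 65.0, 61.0, 63.0, 90.0, 64.0],
    "CALI": [8.5, 8.6, 8.4, 8.5, 8.6, 8.5, 14.0, 8.5],
    "DEPT": [1000.0, 1000.5, 1001.0, 1001.5, 1002.0, 1002.5, 1003.0, 1003.5],
    "RHOB": [2.40, 2.42, 2.41, 2.38, 2.43, 2.40, 2.10, 2.39],
    "ILD":  [2.0, 2.1, 2.0, 2.1, 2.0, 2.1, 2.0, 2.1],
    "DT":   [90.0, 95.0, 100.0, 180.0, 105.0, None, 98.0, 110.0],
})

cleaned = clean_logs(raw)
print(len(raw), "->", len(cleaned), "events")
```

The quartiles are computed after the incomplete events are dropped, so missing values do not influence the outlier fences.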

6. Data Visualization

Scatter Plot

Histograms

7. Build Machine Learning Models

Data preparation is often the most time-consuming step of the modeling process. It is also one of the most important, since model accuracy is often contingent on the quality of the input data. To this end, we apply the following transformations to the data, not necessarily in this order:

$ z =\frac{x_i-\mu}{\sigma} $
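The standardization formula can be illustrated with NumPy on a single log. The GR values are made up, and the use of the population standard deviation (`ddof=0`, the same convention scikit-learn's `StandardScaler` follows) is an assumption about the intended definition of $\sigma$.

```python
import numpy as np

# Made-up gamma ray readings, for illustration only.
gr = np.array([40.0, 55.0, 70.0, 85.0, 100.0])

mu = gr.mean()
sigma = gr.std()          # population standard deviation (ddof=0), assumed
z = (gr - mu) / sigma     # z-score of every sample

print(z)
print(z.mean(), z.std())
```

After the transformation the log has zero mean and unit standard deviation, which puts logs with very different scales (for example GR and RHOB) on a comparable footing.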

Data Pipeline:

Random Forest - Cross Validation
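One possible shape for the data pipeline and the cross-validated Random Forest is sketched below with scikit-learn. The synthetic feature matrix stands in for the six input logs, and the hyperparameters, the five-fold shuffled split, and the scoring choice are assumptions rather than the settings used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["NPHI", "GR", "CALI", "DEPT", "RHOB", "ILD"]

# Synthetic stand-in data: 200 samples, one column per feature log.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, len(FEATURES)))
# Synthetic "DT" driven by the NPHI and RHOB columns plus noise (illustrative only).
y = 100.0 + 10.0 * X[:, 0] - 5.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

# Standardize the features, then fit a Random Forest regressor.
model = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=50, random_state=0),
)

# Five-fold cross-validation scored by RMSE (scikit-learn reports it negated).
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="neg_root_mean_squared_error")
rmse_per_fold = -scores

print("RMSE per fold:", rmse_per_fold)
print("Mean RMSE:", rmse_per_fold.mean())
```

Placing the scaler inside the pipeline means it is refit on the training portion of each fold, so no statistics leak from the validation samples. With real multi-well data, grouping the folds by well (for example with `GroupKFold`) may give a more realistic estimate of performance on unseen wells.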

8. References:

https://github.com/pddasig/Machine-Learning-Competition-2020/blob/master/Synthetic%20Sonic%20Log%20Generation%20Starter_Yu%202_27_2020.ipynb

https://github.com/andymcdgeo/Petrophysics-Python-Series/blob/master/14%20-%20Displaying%20Lithology%20Data.ipynb

https://github.com/andymcdgeo/Petrophysics-Python-Series/blob/master/05%20-%20Petrophysical%20Calculations.ipynb

https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html